Abstract

The viscosity dependence of the folding rates for four sequences (the
native state of three sequences is a β-sheet, while the fourth forms an α-helix)
is calculated for off-lattice models of proteins. Assuming that the dynamics
is given by the Langevin equation, we show that the folding rates increase
linearly at low viscosities η, decrease as 1/η at large η, and have a maximum
at intermediate values. The Kramers theory of barrier crossing provides a
quantitative fit of the numerical results. By mapping the simulation results
to real proteins we estimate that for optimized sequences the time scale for
forming a four-turn α-helix topology is about 500 ns, whereas for a β-sheet it
is about 10 μs.

Based on several theoretical studies of minimal models of proteins, a novel conceptual
framework for understanding the folding kinetics of proteins and, in general,
biomolecules has recently emerged. The basis of this new framework lies in the observation
that the polymeric nature of proteins, together with the presence of conflicting energies
(arising from the differences in the structural preferences of hydrophobic and hydrophilic
residues), leads to topological frustration - a situation where structures that are favorable
on relatively short length scales are in conflict with the global free energy minimum, namely,
the native state of the protein. Due to topological frustration the underlying free energy
landscape has, besides the dominant native basin of attraction (NBA), several competing
basins of attraction (CBA). Theoretical studies of folding kinetics in such a complex energy
landscape suggest that, in general, the folding of biomolecules takes place by multiple
pathways rather than by a hierarchical organization. Recent experiments lend support to this
"new view" of the folding of biomolecules.

One of the important theoretical predictions is that the mechanisms of protein folding
can be varied depending not only on the intrinsic sequence properties but also on external
conditions (such as pH, salt concentration, viscosity, etc.). The theoretical studies to
date have focused on the temperature dependence of folding rates using minimal models of
proteins. The purpose of this paper is to examine the dependence of the rates of protein
folding on viscosity η (or, equivalently, the friction coefficient ζ) and to provide a picture
of the folding process in terms of the free energy landscape classified using NBA and CBA.
Although a few experimental studies have probed the dependence of the
folding rates on viscosity, they have not been systematic enough to reveal the underlying
folding mechanisms.

We use a continuum minimal model representation of the polypeptide chain and Langevin
dynamics to compute the folding rates as a function of viscosity. The major results of this
study, which were obtained by examining four sequences, each with either a β-sheet or an
α-helix as the native state, are:

(a) The folding rate k_F for the formation of a β-sheet or an α-helix increases linearly with
η at low viscosities, reaches a maximum at moderate values of η, and starts to decrease as
1/η at higher viscosities.

(b) By assuming that the typical free energy barrier to folding scales as √M k_B T_s (T_s
is the simulation temperature and M is the number of beads in the sequence), we find that
the Kramers expression for barrier crossing (with the frequencies at the bottom of the well and
at the barrier top of an appropriate reaction coordinate as adjustable parameters) gives
a quantitative fit of the simulation results. This implies that, at least for small proteins
with simple native state topology, a low-dimensional (or even one-dimensional) reaction
coordinate can adequately describe the folding process [10].

(c) For fast folding sequences, i.e., those that essentially display two-state folding kinetics,
the fraction of molecules that reaches the native state rapidly without being trapped in
CBA, namely, the partition factor Φ, is close to unity and is independent of viscosity. For
slow and moderate folders Φ depends on η, implying that viscosity can be used to alter the
mechanisms of protein folding.

The polypeptide chain is modeled by a sequence of M connected beads, each of them
corresponding to, perhaps, a blob of actual α-carbons. The chain conformation is
determined by the vectors {r_i}, i = 1, 2, ..., M. Although real polypeptide sequences are
made from twenty amino acids, it has been shown that three-letter-code sequences (i.e.,
sequences of residues of three types) can faithfully mimic certain properties of real proteins
[11]. Accordingly, we assume that the protein sequence is made of hydrophobic (B), hydrophilic
(L), and neutral (N) residues. In these models a sequence is specified by the precise way in
which B, L, and N beads are connected together.

Following our earlier work, the energy of a conformation is taken to be the sum of a
bond-stretch potential, a bond-angle potential, a potential associated with the dihedral angle
degrees of freedom, and a non-bonded potential, which is responsible for tertiary interactions.
The details of the potentials and the compositions of the three sequences labeled E, G, and
I, with a β-sheet as the native state, are given elsewhere. The potential energy function
and sequence composition for sequence H, with the α-helix as the native conformation, are
the same as in the previous study [12]. The parameters in the dihedral angle potential
V(φ) = A_φ(1 - cos φ) + B_φ(1 + cos 3φ) + C_φ(1 - sin φ) are taken to be A_φ = 1.0 ε_h,
B_φ = 1.6 ε_h, and C_φ = 2.0 ε_h, respectively, where the parameter ε_h ≈ (1-2) kcal/mol is
the average strength of the hydrophobic interaction. These parameters differ from the ones
used earlier [12]. The native conformations for sequences I and H, which are determined by
the methods described in [11], are displayed in Fig. (1).
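
As a quick check on the dihedral term, the sketch below evaluates V(φ) at the three rotamer angles named in the caption of Fig. (1). It assumes the functional form and coefficients exactly as quoted in the text; the function name and the choice of sample angles are illustrative and not taken from the original work.

```python
import math

# Coefficients quoted in the text, in units of eps_h (hydrophobic energy scale).
A_PHI, B_PHI, C_PHI = 1.0, 1.6, 2.0


def dihedral_potential(phi_deg, a=A_PHI, b=B_PHI, c=C_PHI):
    """Return V(phi) = A(1 - cos phi) + B(1 + cos 3phi) + C(1 - sin phi).

    phi_deg is the dihedral angle in degrees; the result is in units of eps_h.
    """
    phi = math.radians(phi_deg)
    return (a * (1.0 - math.cos(phi))
            + b * (1.0 + math.cos(3.0 * phi))
            + c * (1.0 - math.sin(phi)))


if __name__ == "__main__":
    # Compare the three nominal rotamer positions g+, t, and g-.
    for name, angle in (("g+", 60.0), ("t", 180.0), ("g-", -60.0)):
        print(f"{name:>2} ({angle:7.1f} deg): V = {dihedral_potential(angle):.3f} eps_h")
```

With these coefficients the sin φ term lowers the energy near +60° relative to 180° and -60°, so this form of the potential biases the chain toward a single helical handedness.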

The choice of sequences was dictated by the following considerations. It has been shown
[11] that the kinetic accessibility and the associated thermodynamic stability of the native
conformation for minimal protein models correlate extremely well with σ = (T_θ - T_F)/T_θ
(0 ≤ σ ≤ 1), where the two characteristic temperatures intrinsic to the sequence, T_θ and
T_F, are the collapse and the folding transition temperatures, respectively. Sequences with
relatively small values of σ reach the native conformation very rapidly without being trapped
in any detectable intermediates, whereas sequences with large σ have several CBA that
act as kinetic traps [11]. Two of the sequences, G and I, have relatively small values of σ
[11] (0.20 and 0.14, respectively) and hence are fast folders, which implies that in excess of
90% of the molecules reach the native conformation on the time scale in which collapse and
formation of the native state are almost synchronous. Sequence E is a moderate folder,
while H is a slow folder (with σ = 0.39 and 0.75, respectively). Since these four sequences
involve the two common structural motifs in proteins and span a range of σ, meaningful
conclusions regarding the viscosity dependence of the folding of small proteins can be drawn.

We assume that the Langevin equation provides an adequate description of the polypeptide
chain dynamics. Since our goal is to study the dependence of the folding rate on
viscosity over a wide range, we are forced to use different algorithms depending on the
precise value of the viscosity η or the friction coefficient ζ (= 6πηa). In the low-ζ limit,
corresponding to the energy diffusion regime, the inertial terms are important and we use
noisy molecular dynamics [11]. At higher values of ζ we use the Brownian dynamics
algorithm of Ermak and McCammon [13]. We have verified that both algorithms give
identical results for folding rates in the intermediate range of ζ. For the sake of consistency
we measure time in units of τ_L = (ma²/ε_h)^{1/2}, where m is the mass of a bead and a is the
bond length between successive beads.
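
To illustrate the high-friction limit, here is a minimal one-dimensional sketch of an overdamped (Brownian) update without hydrodynamic interactions. It is not the authors' code: the harmonic test force, the parameter values, and the function names are assumptions chosen only so that the result can be checked against the equipartition value ⟨x²⟩ = k_B T / k.

```python
import math
import random


def brownian_step(x, force, zeta, kT, dt, rng):
    """Advance x by one overdamped step: drift F/zeta plus Gaussian noise.

    The noise standard deviation sqrt(2 kT dt / zeta) follows from the
    fluctuation-dissipation relation D = kT / zeta.
    """
    noise_sd = math.sqrt(2.0 * kT * dt / zeta)
    return x + force(x) * dt / zeta + rng.gauss(0.0, noise_sd)


def simulate_harmonic(k_spring=1.0, zeta=5.0, kT=0.4, dt=0.05,
                      n_steps=200_000, seed=1):
    """Estimate <x^2> for a bead in a harmonic well (hypothetical test system)."""
    rng = random.Random(seed)

    def force(x):
        return -k_spring * x

    x = 0.0
    acc = 0.0
    for _ in range(n_steps):
        x = brownian_step(x, force, zeta, kT, dt, rng)
        acc += x * x
    return acc / n_steps  # should be close to kT / k_spring


if __name__ == "__main__":
    print("estimated <x^2>:", simulate_harmonic())
```

A full chain simulation would apply the same update to every Cartesian coordinate of every bead, with the force derived from the complete potential energy function.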

The external conditions in the simulations are ζ and the temperature T. Since we focus
here on the variation of the folding rate with ζ, it is desirable to choose sequence-dependent
simulation temperatures T_s so that the extent of the native conformation (given by the
structural overlap function χ, which measures the similarity of a given conformation to the
native state [11]) is the same for all the sequences. The simulation temperature T_s is
chosen so that (i) T_s < T_F and (ii) ⟨χ(T_s)⟩ = α is the same for all sequences with a given
native topology. Condition (i), T_s < T_F, ensures that the native conformation has the largest
occupation probability, while condition (ii) allows us to subject the sequences
to similar folding conditions. With the choice α = 0.26, T_s (measured in
units of ε_h) turns out to be 0.29, 0.37, and 0.41 for sequences E, I, and G, respectively [11].
For sequence H we took α to be 0.32, and the resulting T_s is 0.24 (details to be published
elsewhere).

The folding rate is calculated as

$$k_F^{-1} = \frac{1}{N_{max}} \sum_{i=1}^{N_{max}} \tau_{1i} \qquad (1)$$

where τ_{1i} is the first passage time (the first time a given trajectory reaches the native
conformation) for trajectory i and N_max is the maximum number of trajectories used in
the simulations. The value of N_max ranges from 200 to 600, depending on ζ and the sequence,
which gives well-converged results for k_F. In Fig. (2) we plot the ratio k_F/k_TST as a function
of ζ, where k_TST is the transition state estimate of the folding rate. The calculation of k_TST
is described in the caption to Fig. (2). The top, middle, and lower panels correspond to
sequences G, E, and H, respectively. It is clear that the folding rate increases (roughly
linearly) at low ζ and decreases as 1/ζ at higher viscosity. There is a maximum at moderate
values of ζ.
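
The rate defined above is simply the inverse of the mean first passage time over the trajectories. A minimal sketch follows; the function name and the sample numbers in the usage lines are illustrative and are not simulation data.

```python
def folding_rate(first_passage_times):
    """Return k_F = 1 / (mean first passage time) for a set of trajectories."""
    times = list(first_passage_times)
    if not times:
        raise ValueError("need at least one first passage time")
    if any(t <= 0 for t in times):
        raise ValueError("first passage times must be positive")
    mean_fpt = sum(times) / len(times)
    return 1.0 / mean_fpt


if __name__ == "__main__":
    # Three made-up first passage times in reduced time units.
    print(folding_rate([1.0, 2.0, 3.0]))
```

Because a few long-lived trapped trajectories can dominate the mean, the estimate converges only when the number of trajectories is large enough to sample the slow phases.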

The remarkable similarity of the dependence of the rate of folding on ζ to the predictions of
Kramers theory for barrier crossing suggests that for proteins with simple native state
topology a suitable one-dimensional reaction coordinate suffices. In fact, the simple
one-dimensional Kramers theory can be used to analyze our results quantitatively. According
to Kramers theory the rate for barrier crossing in the moderate to high viscosity regime is
given by

$$k_{KR} = \frac{\omega_A}{2\pi\omega_B}\left(\sqrt{\frac{\zeta^2}{4} + \omega_B^2} - \frac{\zeta}{2}\right)\exp\left(-\frac{\Delta F}{k_B T_s}\right) \qquad (2)$$

where ΔF is the typical barrier height that the protein overcomes en route to the native
state, and ω_A and ω_B are the frequencies at the minimum and saddle (transition state [17])
points of a suitable undetermined one-dimensional reaction coordinate describing the folding
process. One of us has argued that the typical free energy barrier in the folding process
scales as ΔF ≈ √M T_s. If this result is used, then there are two parameters in k_KR,
namely ω_A/ω_B and ω_B, that can be used to fit the simulation results. The solid lines in Fig.
(2) show the results of such a fit. It is clear that Eq. (2) fits the data quantitatively. The
numerical values of ω_A and ω_B (see caption to Fig. (2)) for the four sequences (data for
sequence I not shown) suggest that the barriers for slow folders are, in general, flatter than
for fast folders.
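
The sketch below evaluates the standard Kramers expression for moderate-to-high friction and its ratio to the transition state estimate. It assumes unit bead mass and ΔF/k_B T_s = √M, as stated in the caption to Fig. (2); the function names and the value of M in the usage lines are illustrative. This expression does not describe the linear rise of the rate at very low friction.

```python
import math


def transmission(zeta, omega_b):
    """Return k_KR / k_TST = (sqrt(zeta^2/4 + omega_b^2) - zeta/2) / omega_b.

    Written in an algebraically equivalent form that avoids subtracting
    two nearly equal numbers when zeta is large.
    """
    root = math.sqrt(0.25 * zeta * zeta + omega_b * omega_b)
    return omega_b / (root + 0.5 * zeta)


def tst_rate(omega_a, n_beads):
    """Transition state estimate omega_a * exp(-sqrt(M)) / (2 pi)."""
    return omega_a * math.exp(-math.sqrt(n_beads)) / (2.0 * math.pi)


def kramers_rate(zeta, omega_a, omega_b, n_beads):
    """Kramers rate in the moderate-to-high friction regime."""
    return tst_rate(omega_a, n_beads) * transmission(zeta, omega_b)


if __name__ == "__main__":
    # omega_a and omega_b are the fitted values quoted for sequence G in the
    # caption to Fig. (2); n_beads = 22 is a placeholder, not from the text.
    for zeta in (0.5, 5.0, 50.0):
        ratio = transmission(zeta, omega_b=2.47)
        print(f"zeta = {zeta:5.1f}: k_KR / k_TST = {ratio:.4f}")
    print("k_KR at zeta = 50:", kramers_rate(50.0, 1.86, 2.47, 22))
```

For ζ much larger than ω_B the ratio approaches ω_B/ζ, which is the 1/ζ decay seen at high viscosity.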

The quantitative description of the simulation results by the Kramers theory shows that
(a) the assumption of a one-dimensional reaction coordinate for folding of proteins with
simple native state topology is appropriate (furthermore, it appears that in the energy
diffusion regime an appropriate folding reaction coordinate couples rather weakly to the
other degrees of freedom; this is supported by the observation that in nearly all the sequences
examined, once a critical number of native contacts is established, a rapid acquisition of the
native state takes place); (b) the typical barrier in the folding process scales sublinearly
with M, the number of amino acids, and is adequately given by √M T_s; (c) the transition
state estimate for the folding rate in the viscosity regime of experimental interest is at least two
orders of magnitude larger than the actual rate.

In our earlier studies we have shown that, due to topological frustration, the refolding
of proteins follows the kinetic partitioning mechanism (KPM). It is of interest to
compute the partition factor Φ, which gives the yield of native molecules that arrive rapidly
without being trapped in any intermediate, as a function of ζ. Using the distribution of first
passage times, Φ can be easily obtained [11]. The partition factor Φ shows no significant
variation for the fast folding sequences G and I (data not shown). The dependence of
Φ on ζ for sequence E is displayed in Fig. (3). Similar results are obtained for the α-helix.
This figure shows that, as suggested earlier, the generic feature of KPM, namely,
that for foldable sequences a fraction of molecules Φ (determined by σ) reaches the native
conformation rapidly, remains valid for all values of ζ. Just like the rate of folding itself, Φ also
shows non-monotonic behavior. Although there is no systematic trend in the variation of
Φ with ζ, sequence E tends to behave as a fast folder (Φ ≳ 0.9) at the higher and lower
values of ζ.

The large variation of Φ with ζ for moderate and slow folders suggests that the pathways by
which the polypeptide chain reaches the native state can be altered significantly by changing
η. In order to probe this we generated two hundred distinct initial conditions at ζ = 5.0
for sequence E. At this value of ζ we find Φ ≈ 0.8, and accordingly we determined that
there are 44 trajectories that get trapped in the CBAs. Using exactly the same initial
conditions we altered ζ to 0.16 and discovered that, out of the 44 slow folding trajectories, 20
became fast folding, indicating a dramatic change in pathways with the alteration
of viscosity. Similar results were obtained for the 156 fast folding trajectories at the higher
ζ. Note that Φ is a dynamic quantity and the conservation of the number of denatured
molecules only requires that the sum of amplitudes of the fast and all the slow folding phases be
constant. An important experimental consequence of this result is that the folding scenarios
can also be dramatically changed by varying ζ, so that a sequence that appears to be a
fast folder at one value of ζ may be a moderate folder at a different viscosity.
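
The trajectory counts quoted above can be turned into a partition factor with simple arithmetic. The sketch below uses only the numbers given in the text (200 initial conditions, 44 trapped at the higher friction, 20 of those 44 becoming fast at the lower friction); the function names are illustrative.

```python
def partition_factor(n_fast, n_total):
    """Fraction of trajectories that reach the native state without trapping."""
    if n_total <= 0 or not 0 <= n_fast <= n_total:
        raise ValueError("need 0 <= n_fast <= n_total and n_total > 0")
    return n_fast / n_total


def rerouted_fraction(n_switched, n_slow):
    """Fraction of previously slow trajectories that became fast."""
    if n_slow <= 0 or not 0 <= n_switched <= n_slow:
        raise ValueError("need 0 <= n_switched <= n_slow and n_slow > 0")
    return n_switched / n_slow


if __name__ == "__main__":
    n_total, n_trapped = 200, 44
    phi = partition_factor(n_total - n_trapped, n_total)
    print(f"partition factor at the higher friction: {phi:.2f}")
    print(f"slow trajectories rerouted at lower friction: "
          f"{rerouted_fraction(20, n_trapped):.2f}")
```

The first number reproduces the quoted Φ of about 0.8, and the second shows that nearly half of the trapped trajectories change class when only the friction is changed.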

The results in Fig. (2) may be used to obtain time scales of folding of small proteins.
The friction coefficient ζ corresponding to water (η = 0.01 Poise at T = 25°C) is roughly
6πηa, and this corresponds to ζ ≈ 50 in reduced units with a ≈ 5 Å. In this range of ζ values
the inertial terms are irrelevant and the natural measure of time is τ_H = ζa²/k_B T_s, which
for water turns out to be about 3 ns [11]. Using this mapping we find that the time constant
for the formation of a β-sheet ranges from 0.03 to 0.1 ms depending on the value of σ. A
similar calculation for sequence H shows that the time scale for the formation of a short
α-helix, containing about four turns, is about 10 μs. It should be noted that the α-helix in
our study is not well optimized, since σ = 0.75 is relatively large. Well-optimized sequences
have σ values in the range of 0.3 or less [11]. If we assume that the folding time scales roughly
as σ³, then we predict that an optimized helical sequence with four turns would fold in
about 500 ns, or 0.5 μs.
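
The extrapolation to an optimized helix is a one-line rescaling. The sketch below assumes that the folding time varies as the cube of σ, which is the reading of the scaling law adopted here; the function name is illustrative, and the inputs are the values quoted in the text (10 μs at σ = 0.75, target σ = 0.3).

```python
def rescale_folding_time(t_ref, sigma_ref, sigma_new, exponent=3.0):
    """Scale a reference folding time by (sigma_new / sigma_ref) ** exponent.

    The cubic exponent is an assumption of this sketch.
    """
    if sigma_ref <= 0 or sigma_new <= 0:
        raise ValueError("sigma values must be positive")
    return t_ref * (sigma_new / sigma_ref) ** exponent


if __name__ == "__main__":
    t_helix_us = rescale_folding_time(t_ref=10.0, sigma_ref=0.75, sigma_new=0.3)
    print(f"estimated folding time: {t_helix_us:.2f} microseconds")
```

The result is roughly 0.6 μs, in line with the order-of-magnitude estimate of about 500 ns quoted above.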

Our theoretical predictions can be verified by experiments on folding kinetics in the
viscosity regime η_water < η < 10 η_water. The prediction that the yield of the fast folding process
(namely, Φ) for moderate and slow folders can be drastically altered by changing viscosity
is amenable to experimental tests.

FIGURES



Fig. 1. The conformations of the native state for sequences I (β-sheet, left panel) and
H (α-helix, right panel). The sequences consist of hydrophobic B residues (shown in blue),
hydrophilic L residues (shown in red), and neutral N residues (shown in grey). In the
β-sheet, neutral residues are near the loop region, where the dihedral angles can adopt
g⁺ (≈ 60°), t (≈ 180°), or g⁻ (≈ -60°) positions, where g⁺, t, and g⁻ are the three minima in the
dihedral angle potential [11]. The dihedral angles in the β-strands are in the t conformation.
The β-sheet is stabilized by the attractive hydrophobic B-B interactions. All the dihedral
angles in the helical structure are in g⁺ positions. The number of beads (residues) in one turn
of the helix is 3.9. These structural features are in accord with those seen in real proteins.
This figure is located at http://www.glue.umd.edu/~klimov/seq_I_H.html.

Fig. 2. The ratio of the folding rate to that of the transition state value as a function of ζ.
The top, middle, and bottom panels correspond to sequences G, E, and H, respectively.
Sequences G and E have a β-sheet as the native state, and the native conformation for sequence
H is an α-helix. The solid lines are the fits using the Kramers expression for barrier crossing
(cf. Eq. (2)). The free energy barrier in Eq. (2) is taken to be √M T_s, so that ω_A and ω_B are
used as fitting parameters. From the best fit the transition state estimate of the folding rate
is calculated using k_TST = ω_A exp(-√M)/2π. For sequences G and H the fit is done using
nine data points with ζ > 0.16, while for sequence E ten data points are used with ζ > 0.05.
The best least-squares fits for sequences G, E, and H give [1.86 (0.01), 2.47 (0.01)],
[1.95 (0.08), 1.43 (0.03)], and [3.78 (0.02), 0.75 (0.002)], respectively. The numbers
in square brackets correspond to ω_A and ω_B, and the numbers in parentheses are
error estimates. The viscosity of water gives ζ = 50.

Fig. 3. The partition factor Φ, which gives the fraction of fast folding trajectories, as a
function of ζ for sequence E (a moderate folder). Folding pathways depend strongly on ζ,
leading to dramatic variation in Φ. For example, at ζ = 500 this sequence may be classified
as a fast folder (Φ > 0.9), while at all other ζ it is a moderate folder. There is no dependence
of Φ on ζ for fast folding sequences (data not shown).

This figure "fig1.gif" is available in "gif" format from:
http://arXiv.org/ps/cond-mat/9705309v1



